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The large-scale properties of chemical reaction systems, such as the metabolism, can be studied with graph-based 
methods. To do this, one needs to reduce the information — lists of chemical reactions — available in databases. 
Even for the simplest type of graph representation, this reduction can be done in several ways. We investigate 
different simple network representations by testing how well they encode information about one biologically im- 
portant network structure — network modularity (the propensity for edges to be cluster into dense groups that 
are sparsely connected between each other). To reach this goal, we design a model of reaction-systems where 
network modularity can be controlled and measure how well the reduction to simple graphs capture the modular 
structure of the model reaction system. We find that the network types that best capture the modular structure 
of the reaction system are substrate-product networks (where substrates are linked to products of a reaction) 
and substance networks (with edges between all substances participating in a reaction). Furthermore, we argue 
that the proposed model for reaction systems with tunable clustering is a general framework for studies of how 
reaction-systems are affected by modularity. To this end, we investigate statistical properties of the model and 
find, among other things, that it recreate coiTclations between degree and mass of the molecules. 
Keywords: Chemical Networks, Complex Networks, Chemical Reaction Systems, Statistical Graph Meth- 
ods 



I. INTRODUCTION 

Metabolism, the set of chemical processes sustaining life 
in an organism, is a well-studied example of a chemical re- 
action system. Such systems are fundamental structures at 
many scales in both living organisms and the rest of the uni- 
verse. From the reactions in planetary atmospheres, to the 
geochemical processes under the surface of our planet, and 
the mentioned metabolism. One can even include nuclear 
reactions, occurring in stars and planetary interiors, in the 
same framework (in this paper, however, we will use chem- 
ical terminology). Chemical reaction systems can be mod- 
elled and analyzed at different levels. At a detailed level, one 
can study the dynamics of the system with differential equa- 
tions. Such an approach is fruitful for modelling a relatively 
independent subsystem, a module (Del Vecchio 2008), such 
as e.g. the citric acid cycle. In a simple model at this level 
of description using mass-action kinetics, reactions are de- 
scribed by concentrations of the reactants, catalysts and re- 
action coefficients. The number of parameters (the reaction 
coefficients) scales like the number of reactions. For real sys- 
tems, many of these parameters are unknown. (In more elab- 
orate models, including e.g. temperature dependence, the sit- 
uation gets even more intricate.) The complexity of modeling 
a large metabolic system in such a framework is staggering. 
The approach we take disregards all the reaction coefficients 
and reduces the system to a graph. Such a reduction can be 
done in several ways. Typically, in more elaborate graph rep- 
resentations (including directed edges, separating reactions, 
enzymes and substances, etc.) one can encode more informa- 
tion from the original system, but few general graph methods 
apply, so one would have to construct new ones. For simpler 
graph representations, more information is lost in the reduc- 
tion from the reaction system to the graph, but a vast number 
of analysis methods are applicable. In this work we choose 
the second approach and study a very simple class of graphs, 
conveniently termed simple graphs — unweighted, undirected 



graphs without multiple edges or self-edges. Even in such a 
framework, one can construct graphs from reaction systems in 
many ways. In Fig.[T] we define four types of simple graphs: 
substrate-product graphs connecting the substances reacting 
(the substrates) with the products of the reaction, substrate- 
substrate graphs linking molecules that can react with each 
others, or are products of the same reaction, substance graphs 
connecting all substances participating in a reaction, and re- 
action graphs where the vertices (nodes) are reactions and 
an edge represents a pair of reactions that have a common 
substance. These different graph representations accentuate 
different aspects of the reaction system. For different ques- 
tions about the system, one may need different graph repre- 
sentations. However, for all these representations information 
is lost in the conversion process. This is also true for sev- 
eral types of more sophisticated representations (with different 
types of vertices for substances and reactions, directed edges, 
and so on). The reason is that, with respect to the reaction 
dynamics, the edges are not independent. In order for mass 
to flow along an edge in a substrate-product graph, molecules 
of other vertices (than the two making up the edge) need to 
be present. Of course, more complex graph types can em- 
body more information about the original system than simple 
graphs can. The main reason for studying simple graphs is 
the vast number of analysis tools that can be applied without 
modification. 

One of the main conclusions from graph-based studies of 
metabolic networks is that they have a modular structure — 
they can be divided into subgraphs that are more densely con- 
nected within, than between, each other (Huss & Holme 2007, 
Zhao et al. 2007, Zhao et al. 2006, Goelzer et al. 2008). As- 
suming that the edges of the metabolic network contribute to 
more or less the same degree to the dynamics of the chemi- 
cal system, these network modules should be subsystems (of 
the dynamic reaction system) with some degree of autonomy. 
This is close to the general idea of biological modularity — 
a biological module is commonly defined as a subsystem per- 
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FIG. 1 An illustration of different network representations de- 
rived from the two hypothetical reactions shown in (a), (b) shows 
a substrate-product network, (c) illustrates a substrate-substrate 
network, (d) represents a substance network (including both the 
substrate-product and substrate-substrate type edges), (e) shows a 
reaction network where the vertices are reactions connected if they 
have a substance in common. 



forming some specific, rather well defined, biological function 
(Del Vecchio 2008, Han et al. 2004, Ihmels et al. 2002, Kitano 
2004). One can of course think of modules of non-biological 
reaction systems as well. Biological modularity is a dynamic 
concept that lack a precise, general definition and the link be- 
tween biological and network modularity is a vast research 
question, beyond the full scope of this paper. Nevertheless, 
to be able to group and categorize modules in a biologically 
relevant way is important for our understanding of the biolog- 
ical organizations, with implications from more applied issues 
(like drug design) to more fundamental (like evolution). One 
reasonable requirement for choosing a network representation 
is that they should preserve the network modularity as much 
as possible. In this paper, we will use this criterion, and a 
model of chemical reaction systems with a tunable network 
modularity, to investigate the above mentioned graph repre- 
sentations. 

To outline the paper, we will start by defining the model for 
modular reaction systems, discuss how the network modules 
can be identified in simple graphs, and finally investigate how 
well the different graph representations can preserve the infor- 
mation about the modules as well as other statistical properties 
of the model. 



II. DEFINITIONS AND MODEL 
A. The Model 

In this section, we present the model for modular chemical 
reaction systems mathematically. To do this we need to start 
by introducing a few notations. Let X be a set of atoms 
(in case of nuclear reaction systems "atoms" would mean ele- 
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FIG. 2 An illustration of the model. The circles represent atoms, 
grouped into molecules. Triangles represent reactions. Solid lines 
link substrates with reactions, while dashed lines indicate prod- 
ucts. Numbers on the lines indicate the multiplicities fij of the sub- 
stances in the reactions. Starting from the configuration in (a), two 
molecules, Ji and S2 in (b), are picked at random. From these, 
two other molecules, pi and p2 and multiplicities yU] and yU2, are 
generated in such a way that the mass is conserved in the reaction 
piSi + IU2S2 — > ri + r2. Since p2 already is in the set of metabolites 
the resulting reaction system, from adding this reaction, looks like 
(c). An aspect of the model not illustrated here is that the algorithm 
favors products already present in the substance set. In the program 
we treat reactions as bidirectional, but the structure of the algorithm 
identical if they are directional. 



mentary particles). A molecule cr e Z is a set of atoms and can 
be represented as a vector cr - {m\{<j), ■ ■ ■ , mj^[{cr)). mxicf) is 
the multiplicity of atom x in the molecule. A reaction is a pair 
of sets of molecules {S, P), where S is called substrates and P 
products. Msicr) denotes the multiplicity of cr in 5. A set of 
reactions is called a reaction system. 

What are the constraints on the formation of chemical re- 
action systems? Arguably, the most fundamental restriction is 
law of mass conservation, stating that the multiplicities of all 
substances are the same in both S and P — Ms (cr) - Mpicr) 
for all reactions and all cr e £. This means that no atoms 
are lost, or created, during chemical reactions. A second 
important constraint of chemical reactions is the stability of 
molecules — even if, say, an 04-molecule could form by two 
oxygen molecules colliding, it is such a transient state that 
it is meaningless to count as a member of a reaction system. 
A third major factor shaping the reaction systems of the real 
world is the reaction dynamics itself. The probability of a 
hypothetical reaction A -t- B — > C H- D does not only de- 
pend on the presence of A, B, C and D; it also depends on the 
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free energy change of the reaction, and the presence of cat- 
alysts (or enzymes, in a biological context) or inhibitors that 
can lower, or raise, the activation barrier, and thus affect the 
probability of a reaction. Our model will obey mass conser- 
vation, but we will keep it on an abstract and general level 
and not map the atoms X to real atoms. Thus, the stability of 
molecules will not be an explicit factor in the model. In this 
study, reaction coefficients are not assigned to the reactions — 
if needed, this could be done with an auxiliary model, maybe 
including correlations between the molecular mass and reac- 
tion coefficients. Furthermore, to keep the model as simple 
as possible we will not attempt to model other features than 
modularity, observed in reaction systems. One reason is that 
different kinds of reaction systems may have different network 
structure. Another reason is that some of these structures still 
are not completely characterized — the degree distribution, 
for example, is broad, but not strictly power-law (Holme & 
Huss 2008) (in fact, for the most common network represen- 
tations, it is too broad to follow a power-law distribution). A 
third reason not to include other structure than network mod- 
ularity is that it our model can easily be extended to include 
this. 

To sketch the algorithm, let Na be the number of atoms 
in the system, let Nr be the number of reactions, let g be 
the number of groups (potential network modules). The al- 
gorithm starts by going through all groups adding jNnlg re- 
actions within each group. After that, the remaining (1 - y)NR 
reactions are added to tie the different modules together. If 
the parameter y is large (close to 1), many of the reactions 
will take place within a group, giving the group high network 
modularity. In the remainder of this section, we will describe 
the details of these reaction addition procedures. 

Consider the group / (1 < ; < g). We assign the atoms 
(/ - V^N^Ig + 1, ■ ■ ■ ANAlg to this group and, furthermore, 
add the NAlg possible single-atom molecules to the set of /'s 
molecules S,. After this, iterate the following yNu/g times: 

1. Pick two random substances and S2 from S,-. 

2. Assign reaction multiplicities jji and //2 to these (we 
choose si and 52 at random, with equal probability, 
from the interval [1, ■ ■ ■ ,/Vmax])- 

3. Now, construct two potential molecules pi and p2 by, 
for each atom x, assign a random fraction of the sum of 
multiplicities jjiin^isi) + fiimxisx) to pi and the rest to 
P2 (to assure mass conservation). 

4. If pi + p2, Pi, Pi e 2;, both pi and p2 are different from 
s\ and S2, and the reaction p\S\ +P.2S2 * — > Pi + Pi does 
not exist in the set of reactions of group / /?,, then add 
this reaction to /?,-. Otherwise go to step[T] 

5. If the algorithm has been iterated to step |4] «tiiai times 
without the condition that pi and p2 should be in S, ful- 
filled, then make another iteration from step [T] and add 
these molecules to Z, and the reaction p.isi + piS2 < — > 
pi + pi to Ri. 

The purpose of the Wtriai iterations is to control how dense a 
group is. The larger «triai is, the more reactions will each 



molecule be involved in, i.e. the denser the group will be (as 
larger ntnai means that fewer molecules have to be added to 
S,). The reactions are assumed bidirectional, but this can triv- 
ially be changed to directed reactions. To avoid the reactions 
being too unbalanced with respect to mass, the jUmax parameter 
should be rather low (one could also think of assigning mul- 
tiplicities to the product side, but to keep the model as simple 
as possible we do not do this. We also require there to be 
exactly two molecular species in both S and P. This type of 
reaction is, by far the most common, making up 45% of all hu- 
man metabolic reactions as studied in Holme & Huss (2008) 
(the second most common order is two substrates and three 
products, constituting 14% of the reactions) or 77% of the re- 
actions in the Earth's atmosphere (Sole & Munteanu 2004). It 
is trivial to extend the model to sample reactions according to 
some distribution, empirical or not, of their order An illus- 
tration of these steps (except the requirement that pi and p2 
should be in Z, to break the iterations) of the algorithm can be 
seen in Fig.|2] 

To add the cross-group reactions, we do as the steps above, 
only that we (in step[T]) select substances from the entire set of 
present substances, not only within one group. Furthermore 
we skip step|4](as it will be very unlikely to find pi and p2 in 
S in this case). 

All in all, our model has six parameters: the number of 
groups g, the number of atoms A^^^, the number of reactions 
Nr, the fraction of reactions within the group y, the maximum 
reaction multiplicity fi^ax, and the number of trials to find re- 
actions connecting already existing substances ntriai- 



B. Network modularity 

We will briefly mention how we calculate network mod- 
ularity. For a more in-depth account, see Newman (2006). 
Consider a partition (division) of the vertex set into groups 
and let denote the fraction of edges between group / and j. 
The modularity of this partition is defined as 



(1) 



where the sum is over all groups of vertices. The term 

(2 j ^ij) is the expectation value of e„ in a random multigraph. 

A prototype measure for the modularity QiG) of a graph G is 
Q maximized over all partitions, of all sizes. For metabolic 
networks, it is a standard practice to regard degrees as intrin- 
sic properties of the vertices, and therefore measure network 
structure against a null-model 0{G) — random graphs with 
the (only) constraint that the set of degrees is the same as in 
G. We also do this, and take 



A(G) = Q(G) - {Q(G')}c'eg(.c), 



(2) 



where angular brackets denote average over @(G), as our mea- 
sure of network modularity (Holme & Huss 2006). We use a 
random rewiring of the original graph to sample Q{G) (Maslov 
& Sneppen 2002), and the heuristics proposed in Newman 
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(2006) to calculate Q. We note that it is notoriously difficult to 
compare modularity of networks of different sizes (Kumpula 
et al. 2007), but, in this case, when the networks come from a 
series from the same network being continuously reduced, this 
can at most shift the peak of maximum A marginally (Holme 
& Huss 2006). 

We note that there are several other ways, apart from max- 
imizing Eq. [T] of dividing networks into clusters. We men- 
tion the spectral methods starting from Pothen et al. (1990), 
and information-theory approaches, for instance, Rosvall & 
Bergstrom (2007) and Ziv et al. (2005), asking how the graph 
can best be represented as a coarse-grained graph where ver- 
tices are the clusters of the original graph in such a way that as 
little information as possible is lost in the process. (For even 
more graph clustering methods, see the references of New- 
man, 2006, and Rosvall & Bergstrom, 2007.) The benefit of 
2-maximization, as we see it, is that it is close to the notion 
of clusters being densely connected within and sparsely con- 
nected between each others. A cluster identified with this al- 
gorithm as the property that it cannot be divided in any way 
that would increase the g-modularity. This give the somewhat 
peculiar feature that very sparse regions of the graph can be 
considered a cluster even if they are fragmented. For most 
real-world networks such regions are small and so is this ef- 
fect. One common way of validating these clustering schemes 
is to start from a network of a number of separated cliques 
(fully connected subgraphs), rewire a fraction / of the edges 
and measure the overlap between the detected clusters and the 
original cliques (Girvan & Newman, 2002). Many of the pro- 
posed network clustering methods perform well in this test, 
which also mean they are hkely to perform quite similar for 
the purpose of this work. 

In metabolic networks, it is a common practice to delete 
currency metabolites — abundant substances with a high de- 
gree that tie together modules and thereby blur the modu- 
lar network structure (Huss & Holme 2007, Holme & Huss 
2008). In our model, the new vertices introduced with the 
inter-module reactions do blur the modular structure, but they 
do not have particularly high degrees. To keep the analysis 
of the model simple, we do not include currency-metabolite 
identification. 



III. NUMERICAL RESULTS 

A. An example output 

In Fig.[3f a), we display an example output of our algorithm, 
represented as a bipartite network where edges connect reac- 
tions (triangles) with the substances (circles or squares). The 
parameter j controlling the network modularity (y = 0.90) 
is rather high, giving a visually clear modular structure. The 
additional molecules introduced during the addition of inter- 
module reactions are indicated by squares. Fig.|3jb) shows the 
resulting substance network. The modular structure of the bi- 
partite representation in Fig.[3|a) seems to be retained. From 
this substrate network, we run an algorithm detecting network 
modules (Newman 2006). This algorithm finds as many clus- 



ters as the original network, but misclassifies ~ 2% of the ver- 
tices (the additional vertices for the inter-module reactions are 
not counted), see Fig.[3jc). In Fig.|3jd) we display the output 
from running the modularity detection algorithm on the reac- 
tion network and assigning the identity of a module (in the 
set of reactions) to all vertices participating in this reaction. 
In metabolic databases, one metabolite can be assigned many 
functions; in the spirit of such multifunctionality, this kind of 
categorization might seem reasonable. (Note that the under- 
lying network displayed in Fig. [3jd) is, for comparison, the 
same as the substrate network of panels (b) and (c).) 



B. Network modularity 

y is the model parameter intended for tuning the network 
modularity of the reaction system. It is, therefore, highly de- 
sirable that the relative modularity A should be a monotonous 
function of y for sensible network representations and param- 
eter values. In Fig. Qa) we plot A as a function of y for all 
types of networks discussed, and find such a monotonous re- 
lationship. Preferably, one would like the derived networks to 
span a large range of A-values. All three types of networks 
where the vertices are substances perform comparatively well 
in this respect, whereas the reaction network seems a little 
worse (even if the actual A-values are a larger). 

When we tune y, we keep size of the reaction system — 
the number of reactions and their order (number of substrates 
and products) — constant. However, the size of the derived 
network can be y-dependent, which complicates the analysis 
of the network modularity a little. As seen in Fig. |4|b) and 
(c), except the reaction network, the networks get smaller and 
denser as y is increased. The reaction network has (by con- 
struction) vertices constantly. Even if A = represents a 
neutral modularity with respect to the null model, one needs 
to be careful when comparingA for networks of different sizes 
and average degrees. In Fig. ffld) we plot the two terms, QiG) 
(the maximal modularity of the network) and {Q{G'))c'eg(c) 
(the average maximal modularity of Q{G)), subtracted in the 
definition of A for the substance networks. The decrease of the 
second term as a function of y means that smaller and denser 
networks have lower network modularity Q. If this were true 
for the networks derived from our reaction-system model too; 
then, if we choose the size of the reaction system such that 
the size and average degree of the derived networks are in- 
dependent of y, then the trend of growing Q would be even 
stronger This indicates that y works as a tuning parameter for 
network modularity (in the substance network) whether one 
keeps the size of the original reaction-system or the derived 
networks fixed. The same conclusion can be drawn for all the 
other three types of networks. 



C. Module predictability 

As mentioned above, when y is large, and the network is 
reasonably small, one can spot the modules by inspection, like 
in Fig.[3ja). By applying a network-clustering method one can 
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FIG. 3 An example of a network realization with model parameters g = 8, A^i = 32, = 175, y = 0.9, yu„,i,x = 4 and «„.i;j = 20. (a) shows 
a representation of the reaction system as a bipartite network where one edge type represents reactions and the other type represents chemical 
substances, (b) shows the corresponding substance network. Different colors represent different groups in the construction algorithm. Vertices 
created in the addition of a fraction y of long-range reactions are symbolized by squares, (c) shows the modules detected from the substance 
network in (b). (d) shows the modules identified by first running the network-clustering algorithm on the reaction network, then assigning the 
same cluster identity to all vertices taking part in a reaction of a specific network cluster. In this case one vertex may belong to several network 
clusters. The underlying (white) network in (d) is, for comparison, the same as in (c). 



then recover the group structure as in Fig.|3jc). One desirable 
property of a network representation of a reaction system is 
that this procedure is accurate. In other words, the original, 
modular structure should be recovered reasonably well by the 
clustering algorithm in the presence of the noise created by 
the inter-group reactions. We measure this noise-tolerance 
of the network representations, or module predictability (to 
stress another angle of the issue), by the fraction of overlap- 
ping group identities in the best matching between the orig- 
inal group identities, and the identities detected by the clus- 
tering algorithm. In other words, let jc, be vertex i's group 
in the original reaction-system construction (x, e [1, ■ • • ,g]) 
and yi be vertex i's identity from the graph clustering algo- 
rithm (yi e [1, ■ ■ ■ h is the number of detected groups). 
Then choose a labeling of the graph-clustering groups such 
that each group has a unique number in the interval [1, ■ ■ ■ , /i], 
and that the number «match of vertices / with x,- = y, is as large 
as possible. Then we define the predictability A = nmatch/n^?, 
where Ug is the number of vertices except the veitices in- 
troduced while adding inter-module reactions. We calculate 
nmatch by a simple heuristic: 

1. Start with a random labeling of the groups. 

2. Select a pair of group labels. 

3. If Mmatch does not decrease if these labels are swapped, 
then swap them. 



4. If no improvement has been made during the last «rep 
steps, go to step[2] 



5. Start over from step [T] with a new random seed unless a 
new highest Mmatch has been found in step|4]the last A^rep 
time steps. 



For our small system (h is rarely larger than 20), choosing 
A^rep and Wi-ep between 100 and lO'* gives the same results. 
In Fig. |5] we display the result for the three network types 
with vertices being chemical substances (we do not include 
the reaction network, although that could, with a few addi- 
tional assumptions, also be done). The matching is closest for 
the substrate-product and substance networks. This observa- 
tion is in line with the conclusions of a study of metabolic 
networks (Holme & Huss 2008) where these networks gave 
the most plausible set of currency metabolites and best func- 
tional overlap. Finally, we note that one major factor in the 
decrease of /I as y is lowered from one, is that the network- 
clustering algorithm identifies too many groups (going up to 
more than twice the value of g). It might be the case that 
another network-clustering method, being more restrictive in 
the number of detected molecules, can give higher /l-values. 
In other words, the information of the original graph structure 
might decay slower with decreasing y than indicated by Fig.|5] 
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FIG. 4 Statistics of the model as a function of tfie parameter y. 
The other model parameter values are g = 10, = 50, Nf. = 500, 



= 4 and 



100. The points are averaged over 100 net- 



work realizations. Standard errors are smaller than the symbols, and 
therefore not shown, (a) shows the relative modularity A of the net- 
works as a function of y. (b) shows the average number of molecular 
species for substrate-product networks (the number is, by construc- 
tion, the same for substrate-substrate and substance networks), while 
it is exactly A',; for reaction networks, (c) shows the average degree 
and (d) displays the two different terms in the calculation of A for the 
substance networks. 
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FIG. 5 The predictability A as a. function of y. A is the fraction of 
correctly predicted group identities in matchings between the origi- 
nal group identities in the algorithm and the clusters identified by a 
network-clustering algorithm and the original groups in the network 
constructions. The parameter values are the same as in Fig.|4] 
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FIG. 6 Correlation between molecular weight and degree in net- 
works derived from real and model reaction systems. The points 
are averages for logarithmic bins, (a) shows the data from the human 
metabolic network from the KEGG database, (b) shows the corre- 
sponding plot for our model. The mass of all atoms are set to unity 
in this case (so mass is just a count of the number of atoms constitut- 
ing the molecule). In this figure y = 0.94 and the points are averages 
over 20 networks. Other parameter values are the same as in Fig.|4] 



data from the KEGG database (same data as used in Holme 
& Huss 2008). There is a decaying trend in this relationship 
(not clear enough to talk of a scaling law, but anyway). In 
Fig. |6|b) we show the corresponding plot for our model re- 
action system (all network representations except the reaction 
networks, since the mass of a reaction is somewhat hard to 
define). For our model, we assume all molecules have the 
same weight. Neither the degree, nor the mass have such large 
ranges of values in the model network as in the real network. 
The functional form of the model curves is not matching the 
empirical curves; however, there is however a clear trend of 
decay in this data as well. Possibly, the stoichiometric con- 
straints of the model are enough to explain this correlation in 
real reaction systems. More work is needed to confirm this 
conjecture; we leave it as a speculation in this paper 



IV. DISCUSSION AND CONCLUSIONS 



D. The relation between molecular mass and degree 

As a final analysis of the reaction-system model we include 
our explicit representation of atoms. One situation where this 
aspect of our model can prove useful is to explain the relation 
between molecular mass and degree. In Fig.|6]^a) we plot the 
degree as a function of molecular mass for human metabolic 



We have investigated four types of network representations 
of chemical reaction systems. Of these, substrate-product and 
substance networks are the ones that encode the information 
about the modularity best. This is in line with observations 
that these two network representations are the ones that best 
capture the functional organization of metabolism (Holme & 
Huss 2008). To reach this conclusion, we presented a scheme 
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to generate chemical reaction systems such that the network 
modularity of the derived networks can be tuned by an input 
parameter. The model contains molecules with atoms mod- 
eled expUcitly and reactions ensured to conserve mass. The 
only other structure that is embedded is network modularity. 
One can use the model to study the response of various chem- 
ical phenomena to network modularity. It can also be a base 
for more elaborate modeling, including e.g. the highly skewed 
degree distributions of both metabolic networks and those de- 
rived from reaction systems in planetary atmospheres. Ex- 
cept some basic stoichiometric restrictions (included via the 
explicit modeling of the atoms), the model have very few con- 
straints; this, however, can cause observable effects in the 
network topology. One such effect that we find is that our 
model displays a negative correlation between molecular mass 
and degree, a feature that also can be observed in real-world 
metabolic networks. 

In future studies, in addition to making the model more ac- 
curate as a generative model of real reaction systems, it would 
be interesting to modify it to a model of the evolution of chem- 
ical reaction systems (Furusawa & Kaneko 2006). In such an 
approach it would be desirable that modularity emerges from 
the evolutionary dynamics (Gronlund & Holme 2004) rather 
than being imposed by the construction algorithm. Another 
direction to use a more dynamic interpretation of modularity 
(Han et al. 2004, Gallos et al. 2007) and find a generative 
model for such systems. 
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